HUMAN MUTATOR GENE hMSH2 AND HEREDITARY NON-POLYPOSIS
COLORECTAL CANCER



This invention was made using U.S. government grants from the NIH 
CA47527, CA09320, GM26449, CA09243, CA41183, CA42705, CA57435, and
CA35494, as well as grants from the Department of Energy DOE/ERN/F139 and 
DE-FG 09291ER-61139. Therefore the U.S. government retains certain rights to 
the invention. 

This application is a continuation-in-part of Serial No. 08/056,546, filed 
May 5, 1993. 

TECHNICAL FIELD OF THE INVENTION 

The invention relates to a gene which predisposes individuals to colorectal 
and other cancers. In addition, it also relates to biochemical tests which can be 
used to identify drugs for treatment of affected individuals. 

BACKGROUND OF THE INVENTION 

HNPCC (Lynch syndrome) is one of the most common cancer 
predisposition syndromes, affecting as many as 1 in 200 individuals in the western 
world (Lynch et al., 1993). Affected individuals develop tumors of the colon,
endometrium, ovary and other organs, often before 50 years of age. Although the 
familial nature of this syndrome was discovered nearly a century ago (Warthin et 
al., 1913), the role of heredity in its causation remained difficult to define (Lynch 
et al., 1966). Recently, however, linkage analysis in two large kindreds 
demonstrated association with polymorphic markers on chromosome 2 (Peltomaki 
et al., 1993a). Studies in other families suggested that neoplasia in a major
fraction of HNPCC kindreds is linked to this same chromosome 2p locus (Aaltonen
et al., 1993).

HNPCC is defined clinically by the occurrence of early-onset colon and 
other specific cancers in first degree relatives spanning at least two generations 
(Lynch et al., 1993). The predisposition is inherited in an autosomal dominant 
fashion. It was initially expected that the gene(s) responsible for HNPCC was a 
tumor suppressor gene, as other previously characterized cancer predisposition 
syndromes with this mode of inheritance are caused by suppressor gene mutations 
(reviewed in Knudson, 1993). But the analysis of tumors from HNPCC patients 
suggested a different mechanism. Most loci encoding tumor suppressor genes 
undergo somatic losses during tumorigenesis (Stanbridge, 1990). In contrast, both 
alleles of chromosome 2p loci were found to be retained in HNPCC tumors 
(Aaltonen et al., 1993). During this search for chromosome 2 losses, however,
it was noted that HNPCC tumors exhibited somatic alterations of numerous 
microsatellite sequences. 

Widespread, subtle alterations of the cancer cell genome were first detected 
in a subset of sporadic colorectal tumors using the arbitrarily-primed polymerase 
chain reaction (Peinado et al., 1992). These alterations were subsequently found
to represent deletions of up to 4 nucleotides in genomic polyA tracts (Ionov et al., 
1993). Other studies showed that a similar, distinctive subgroup of sporadic 
tumors had insertions or deletions in a variety of simple repeated sequences, 
particularly microsatellite sequences consisting of dinucleotide or trinucleotide 
repeats (Ionov et al., 1993; Thibodeau et al., 1993; Aaltonen et al., 1993). 
Interestingly, these sporadic tumors had certain features in common with those 
developing in HNPCC kindreds, such as a tendency to be located on the right side 
of the colon and to be near-diploid. These and other data suggested that HNPCC 
and a subset of sporadic tumors were associated with a heritable defect causing 
replication errors (RER) of microsatellites (Ionov et al., 1993; Aaltonen et al.,
1993).

The mechanism underlying the postulated defect could not be determined
from the study of tumor DNA, but studies in simpler organisms provided an 
intriguing possibility (Levinson and Gutman, 1987; Strand et al., 1993). This
work showed that bacteria and yeast containing defective mismatch repair genes 
manifest instability of dinucleotide repeats. The disruption of genes primarily 
involved in DNA replication or recombination had no apparent effect on the 
fidelity of microsatellite replication (reviewed in Kunkel, 1993). These pivotal 
studies suggested that defective mismatch repair might be responsible for the 
microsatellite alterations in the tumors from HNPCC patients (Strand et al., 1993).

Thus there is a need in the art to identify the actual gene and protein 
responsible for hereditary non-polyposis colorectal cancer and the replication error 
phenotype found in both hereditary and sporadic tumors. Identification of the gene 
and protein would allow more widespread diagnostic screening for hereditary non- 
polyposis colorectal cancer than is currently possible. Identification of the 
involved gene and protein would also enable the rational screening of compounds 
for use in drug therapy of hereditary non-polyposis colorectal cancer, and would 
enable gene therapy for affected individuals. 

SUMMARY OF THE INVENTION 

It is an object of the invention to provide a DNA molecule which when 
mutated is the genetic determinant for hereditary non-polyposis colorectal cancer. 

It is another object of the invention to provide DNA molecules which 
contain specific mutations which cause hereditary non-polyposis colorectal cancer. 

It is yet another object of the invention to provide methods of treating 
persons who are predisposed to hereditary non-polyposis colorectal cancer. 

It is still another object of the invention to provide methods for determining 
a predisposition to cancer. 

It is a further object of the invention to provide methods for screening test 
compounds to identify therapeutic agents for treating persons predisposed to 
hereditary non-polyposis colorectal cancer.

It is still another object of the invention to provide a protein which is
important for human DNA mismatch repair. 

It is yet another object of the invention to provide a transgenic animal for 
studying potential therapies for hereditary non-polyposis colorectal cancer. 

These and other objects of the invention are provided by one or more of 
the embodiments described below. In one embodiment of the invention an isolated 
and purified DNA molecule is provided. The molecule has a sequence of at least 
about 20 nucleotides of hMSH2, as shown in SEQ ID NO:1.

In another embodiment of the invention an isolated and purified DNA 
molecule is provided. The DNA molecule has a sequence of at least about 20 
nucleotides of an hMSH2 allele found in a tumor wherein said DNA molecule 
contains a mutation relative to hMSH2 shown in SEQ ID NO:1.

In yet another embodiment of the invention a method of treating a person 
predisposed to hereditary non-polyposis colorectal cancer is provided. The method 
prevents accumulation of somatic mutations. The method involves administering 
a DNA molecule which has a sequence of at least about 20 nucleotides of hMSH2, 
as shown in SEQ ID NO:1, to a person having a mutation in an hMSH2 allele
which predisposes the person to hereditary non-polyposis colorectal cancer,
wherein said DNA molecule is sufficient to remedy the mutation in an hMSH2
allele of the person. 

In another embodiment of the invention a method is provided for 
determining a predisposition to cancer. The method involves testing a body 
sample of a human to ascertain the presence of a mutation in hMSH2 which affects 
hMSH2 expression or hMSH2 protein function, the presence of such a mutation
indicating a predisposition to cancer. 

In still another embodiment of the invention a method is provided for 
screening to identify therapeutic agents which can prevent or ameliorate tumors. 
The screening method involves contacting a test compound with a purified hMSH2 
protein or a cell; determining the ability of the hMSH2 protein or the cell to 
perform DNA mismatch repair, a test compound which increases the ability of said 
hMSH2 protein or said cell to perform DNA mismatch repair being a potential
therapeutic agent. 

In another embodiment of the invention an isolated and purified protein is 
provided. The protein has the sequence shown in SEQ ID NO:2. 

In still another embodiment of the invention a transgenic animal is 
provided. The transgenic (nonhuman) animal maintains an hMSH2 allele in its 
germline. The hMSH2 allele is one which is found in humans having hereditary
non-polyposis colorectal cancer or in RER+ tumors. Also provided are animals 
which have no wild-type MSH2 alleles, due to mutations introduced. 

Thus the present invention provides the art with the sequence of the gene 
responsible for hereditary non-polyposis colorectal cancer and information 
regarding the mechanism by which it causes tumors. This enables the art to 
practice a variety of techniques to identify persons at risk of developing a variety 
of cancers and to treat them to prevent such cancers from actually developing. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 summarizes the markers retained in somatic cell hybrids used in 
locating the hMSH2 gene.

PCR was used to determine whether each of the listed markers was present
(black box) or absent (white box) in the indicated hybrid. The laboratory name of
each hybrid and the formal name (in parentheses) is listed. The hybrid panel was 
also validated with ten additional polymorphic markers outside of the 136-177 
region. M: hybrid derived from microcell-mediated chromosome 2 transfer; T: 
derived from t(X;2) translocation; X: derived from X-irradiated chromosome 2 
donor. MO: mouse-human hybrid; HA: hamster-human hybrid; RA: rat-human 
hybrid; TEL: telomere; CEN: centromere. 

Figure 2 shows a FISH analysis which was used to determine the proximity 
and ordering of DNA sequences within chromosome band 2p16.

Panels 2A and 2B show FISH mapping of the 123 marker. Panel 2A shows 
G-banded metaphase chromosome 2. Panel 2B shows the same chromosome as in
Panel 2A following FISH with a biotin-labeled P1 clone for the 123 marker.
Results localize the 123 marker to chromosome band 2p16.3. Panels 2C and 2D
show co-hybridization documenting the coincident localization of a microdissection
(Micro-FISH) probe from chromosome 2p16 and the 123 marker. Panel 2C shows
DAPI stained metaphase chromosome 2. Panel 2D shows simultaneous
hybridization of the biotin-labeled 123 probe (appearing as an intensely staining
smaller circle) and the Spectrum-Orange labeled 2p16 Micro-FISH probe
(appearing as a diffusely staining larger circle). Panel 2E shows a representative
example of an interphase nucleus simultaneously hybridized with P1 clones for
CA5, hMSH2 and ze3. The results were used to directly measure the distances
between markers in order to establish the order and relative distance between
markers (after Trask et al., 1989). Inset: The image processing program NIH
Image was used to provide an average gray value displayed as a surface plot to
support the length measurements and to graphically illustrate the relative order
information. The surface plot presented defines the specified interphase
chromosome and the relative order CA5-hMSH2-ze3.

Figure 3 shows linkage analysis of HNPCC pedigrees.

All affected individuals in which meiotic recombination occurred between
markers 119 and 136 are included. A black box indicates that the individual did 
not contain the allele associated with disease in his/her family or that the individual 
inherited an allele not associated with disease from his/her affected parent. A 
white box indicates that the individual had an allele which was the same size as the 
disease-associated allele. A hatched circle indicates that the marker was not 
studied. All individuals had colon or endometrial cancer at less than 55 years of 
age, or had progeny with such disease but did not indicate that the patient 
necessarily had disease-associated alleles because phase could usually not be 
determined. 

Figure 4 shows hMSH2 gene localization. 

Southern blots containing EcoRI (figures 4A and 4C) and PstI (figures 4B 
and 4D) digested DNA from the indicated somatic cell hybrids (figures 4A and 4B)
or YAC clones (figures 4C and 4D) were hybridized with a radiolabelled insert
from cDNA clone pNP-23. Southern blotting and hybridization were performed 
as described (Vogelstein et al., 1987). Autoradiographs are shown. The 5.0 kb 
PstI fragment in hybrids Z11 and Z12 is derived from hamster DNA.

Figure 5 shows the cDNA sequence of hMSH2.

An open reading frame (ORF) begins at nucleotide 1 and ends at nt 2802. 
The predicted amino acid sequence is shown. The sequence downstream of nt 
2879 was not determined. 

Figure 6 shows homology between yeast and human MSH2 genes. 

The predicted amino acid sequences of yeast (y) MSH2 (Reenan and 
Kolodner, 1992) and human MSH2 genes are compared within the region of 
highest homology. Blocks of similar amino acids are shaded. 

Figure 7 shows germline and somatic mutations of hMSH2.

Autoradiographs of polyacrylamide gels containing the sequencing reactions 
derived from PCR products are shown. The 1.4 kb PCR products containing a 
conserved region of hMSH2 were generated from genomic DNA samples as 
described in the Examples. Antisense primers were used in the sequencing 
reactions. The ddA mixes from each sequencing reaction were loaded in adjacent 
lanes to facilitate comparison, as were those for C, G, and T. The DNA samples 
were derived from the tumor (lane 1) and normal colon (lane 2) of patient Cx10,
an RER- colon tumor cell line (lane 3), and lymphocytes of patients J-42 (lane 4)
and J-143 (lane 5). Figure 7A: A transition (C to T at codon 622) in lymphocyte
DNA can be observed in HNPCC patients J-42 and J-143. Figure 7B: A transition
(C to T at codon 639) in tumor (lane 1) and normal colonic mucosa (lane 2) of
patient Cx10. Figure 7C: A substitution of a TG dinucleotide for an A at codon
663 can be observed in DNA of the tumor of patient Cx10 (lane 1), but not in
DNA from her normal colon (lane 2). Arrows mark the substitutions in panels A
and B and the TG dinucleotide insertion site in panel C.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The disclosure of Serial No. 08/056,546, filed May 5, 1993, is expressly 
incorporated herein. 

It is a discovery of the present invention that the gene responsible for 
hereditary non-polyposis colorectal cancer is hMSH2, a human analog of bacterial
MutS. The cDNA sequence of hMSH2 is shown in SEQ ID NO:1. This gene
encodes a DNA mismatch repair enzyme. Mutation of the gene causes cells to
accumulate mutations. For example, the observed replication error phenotype
(RER+) found in both sporadic and hereditary non-polyposis colorectal cancer
consists of variations (insertions and deletions) in microsatellite DNA. In yeast and
bacteria defective MutS-related genes cause other types of mutations as well.

Useful DNA molecules according to the invention are those which will 
specifically hybridize to hMSH2 sequences. Typically these are at least about 20
nucleotides in length and have the nucleotide sequence as shown in SEQ ID NO:1.
Such molecules can be labeled, according to any technique known in the art, such 
as with radiolabels, fluorescent labels, enzymatic labels, sequence tags, etc. 
According to another aspect of the invention, the DNA molecules contain a 
mutation which has been found in tumors of HNPCC patients or in sporadic RER+
tumors. Such molecules can be used as allele-specific oligonucleotide probes to
track a particular mutation through a family. 
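The idea of an allele-specific oligonucleotide probe can be illustrated with a minimal Python sketch. It takes a reference sequence and a single-base change and returns a wild-type and a mutant probe of 20 nucleotides spanning the change. The function names, the 40-nucleotide sequence, and the chosen position are hypothetical placeholders for illustration only; they are not taken from SEQ ID NO:1.

```python
def allele_specific_probes(reference, position, mutant_base, length=20):
    """Return (wild_type_probe, mutant_probe) spanning a 0-based position."""
    reference = reference.upper()
    if len(reference) < length:
        raise ValueError("reference is shorter than the probe length")
    if not 0 <= position < len(reference):
        raise ValueError("position lies outside the reference")
    # Center the window on the variant, then clamp it to the sequence ends.
    start = max(0, min(position - length // 2, len(reference) - length))
    wild_type = reference[start:start + length]
    offset = position - start
    mutant = wild_type[:offset] + mutant_base.upper() + wild_type[offset + 1:]
    return wild_type, mutant


def reverse_complement(seq):
    """Return the reverse complement, e.g. for an antisense probe."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical 40-nt stretch with a C-to-T change at 0-based position 20.
example = "ACGTTGCAAGCTTGACCGATCGATTACGGCTAAGCTTGCA"
wild_type_probe, mutant_probe = allele_specific_probes(example, 20, "T")
print(wild_type_probe)
print(mutant_probe)
print(reverse_complement(mutant_probe))
```

The two probes differ at a single base, which is what would allow hybridization to distinguish the alleles under suitably stringent conditions.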

According to some aspects of the invention, it is desirable that the DNA 
encode all or a part of the hMSH2 protein as shown in SEQ ID NO:2. To obtain 
expression of the protein the DNA sequence can be operably linked to appropriate 
control sequences, such as promoter, Kozak consensus, and terminator sequences. 

A person who is predisposed to develop cancers due to inheritance of a 
mutant hMSH2 allele can be treated by administration of a DNA molecule which
contains all or a part of the normal hMSH2 gene sequence as shown in SEQ ID
NO:1. A portion of the gene sequence will be useful when it spans the location
of the mutation which is present in the mutant allele, so that a double 
recombination event between the mutant allele and the normal portion "corrects" 
the defect present in the person. A portion of the gene can also be usefully
administered when it encodes enough of the protein to express a functional DNA 
mismatch repair enzyme. Such a portion need not necessarily recombine with the 
mutant allele, but can be maintained at a separate locus in the genome or on an 
independently replicating vector. Means for administering DNA to humans are 
known in the art, and any can be used as is convenient. A variety of vectors are 
also known for this purpose. According to some techniques vectors are not 
required. Such techniques are well known to those of skill in the art. 

Also contemplated as part of the present invention is the use of a combined 
antineoplastic therapy regimen. Such a combined regimen is useful for patients 
having an RER+ tumor, whether sporadic or associated with HNPCC. The 
regimen combines any standard antineoplastic therapy to which a patient can 
become resistant and hMSH2 gene therapy, as described above. By remedying the 
defect present in RER+ cells, i.e., an hMSH2 mutation, the likelihood of the tumor
developing a resistance mutation is greatly diminished. By delaying or preventing 
the onset of resistance, the life of cancer patients can be prolonged. In addition, 
such prevention of resistance allows a greater degree of tumor destruction by the 
therapeutic agent. Examples of antineoplastic therapies which can be combined 
with hMSH2 gene therapy are hormones, radiation, cytotoxic drugs, cytotoxins, 
and antibodies. 

Body samples can be tested to determine whether the hMSH2 gene is 
normal or mutant. Mutations are those deviations from the sequence shown in 
SEQ ID NO:1 which are associated with disease and which cause a change in
hMSH2 protein function or expression. Such mutations include nonconservative 
amino acid substitutions, deletions, premature terminations and frameshifts. See 
Table I. Suitable body samples for testing include those comprising DNA, RNA, 
or protein, obtained from biopsies, blood, prenatal, or embryonic tissues, for 
example. 
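The mutation classes named above can be illustrated with a short Python sketch that compares a sample coding sequence with a reference and reports the kind of change. This is a simplified illustration under stated assumptions: the 15-nucleotide reference is a hypothetical toy sequence rather than SEQ ID NO:1, the function names are invented for this example, and the classifier does not judge whether a substitution is conservative or whether a change is associated with disease.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify(reference, sample):
    """Name the class of change in `sample` relative to `reference`."""
    reference, sample = reference.upper(), sample.upper()
    if reference == sample:
        return "identical"
    if (len(sample) - len(reference)) % 3 != 0:
        return "frameshift"
    ref_protein, sample_protein = translate(reference), translate(sample)
    if ref_protein == sample_protein:
        return "silent"
    if len(sample_protein) < len(ref_protein) and ref_protein.startswith(sample_protein):
        return "premature termination"
    if len(sample) != len(reference):
        return "in-frame insertion/deletion"
    return "amino acid substitution"


REFERENCE = "ATGCGACAGGCTTAA"  # hypothetical toy sequence
print(classify(REFERENCE, "ATGTGACAGGCTTAA"))   # C to T turns CGA into TGA
print(classify(REFERENCE, "ATGCGTGCAGGCTTAA"))  # TG substituted for a single A
```

The second call mirrors the kind of change described for Figure 7C, where two nucleotides replace one and the reading frame shifts by one base.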

Provided with the information that the defect causing HNPCC and sporadic 
RER+ tumors is in a DNA mismatch repair enzyme, one can perform assays on
test compounds and compositions to determine if they will remedy the defect.
Such therapeutic compounds could bind to missense hMSH2 mutant proteins to
restore the proteins to the normal, active conformation. Alternatively such 
therapeutic compounds could stimulate the expression of alternate pathways for 
mismatch repair. Screening for such therapeutic compounds could be performed 
by contacting test compounds with cells, either normal or those with an hMSH2 
mutation found in a tumor. The mismatch repair activity of the cells which were
contacted with the test compounds is compared with that of the same cells which
were not contacted. Such activity can be tested as is known in
the art. See, for example, Levinson and Gutman, 1987, and Strand et al., 1993.
Observation of changes in microsatellite DNA in cells is one way of assessing 
mismatch repair activity. Another approach is to assay DNA mismatch repair in 
vitro in nuclear extracts. See Holmes, 1990; Thomas, 1991; and Fang, 1993. 
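The comparison of contacted and non-contacted cells can be sketched in Python as follows. The sketch assumes, purely for illustration, that microsatellite instability is scored as the fraction of cell clones whose repeat length at one microsatellite differs from the parental length. The repeat counts, the function names, and the 50% reduction threshold are hypothetical and are not taken from the cited assays.

```python
def instability_fraction(clone_repeat_lengths, parental_length):
    """Fraction of clones whose repeat length differs from the parental allele."""
    if not clone_repeat_lengths:
        raise ValueError("at least one clone is required")
    altered = sum(1 for n in clone_repeat_lengths if n != parental_length)
    return altered / len(clone_repeat_lengths)


def is_candidate(untreated, treated, parental_length, min_reduction=0.5):
    """Flag a compound if it lowers instability by at least `min_reduction`."""
    baseline = instability_fraction(untreated, parental_length)
    with_compound = instability_fraction(treated, parental_length)
    if baseline == 0:
        return False  # nothing to remedy in cells that are already stable
    return (baseline - with_compound) / baseline >= min_reduction


# Hypothetical repeat counts at one (CA)n locus; the parental allele is (CA)20.
untreated_clones = [20, 18, 22, 20, 19, 21, 20, 18, 20, 22]
treated_clones = [20, 20, 20, 19, 20, 20, 20, 20, 21, 20]
print(instability_fraction(untreated_clones, 20))
print(instability_fraction(treated_clones, 20))
print(is_candidate(untreated_clones, treated_clones, 20))
```

In practice several microsatellite loci and replicate cultures would be scored, and the threshold would be chosen with an appropriate statistical test rather than fixed in advance.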

Provided with the cDNA sequence and the amino acid sequence of hMSH2 protein,
one of ordinary skill in the art can readily produce hMSH2 protein, isolated and
purified from other human proteins. For example, recombinant cells or organisms 
can be used to produce the protein in bacteria, yeast, or other convenient cell 
system. The isolated and purified protein can be used in screening for new 
therapeutic agents, for example, in in vitro assays of DNA mismatch repair. The 
protein can also be used to raise antibodies against hMSH2. Therapeutic
administration of the protein is also contemplated. 

Transgenic animals are also contemplated by the present invention. These 
animals would have inserted in their germline hMSH2 alleles which are associated 
with HNPCC or sporadic tumors. Such animals could provide model systems for 
testing drugs and other therapeutic agents to prevent or retard the development of 
tumors. Also contemplated are genetically engineered animals which contain one 
or more mutations in their own MSH2 genes. The mutations will be engineered 
to correspond to mutations found in hMSH2 alleles which are found in HNPCC- 
affected individuals or in other human RER+ tumors. Animals with both native
MSH2 alleles inactivated and containing a human wild-type or mutant hMSH2 
allele are particularly desirable. 

Example 1

Somatic Cell Hybrids 

A panel of human-hamster, human-mouse, and human-rat hybrid cell lines 
was developed to facilitate HNPCC mapping. Hybrids containing only portions 
of chromosome 2 were obtained by microcell-mediated chromosome transfer or by 
standard cell fusions following X-irradiation of the chromosome 2 donor. 
Additionally, two hybrids were used which contained a t(X;2)(q28;p21)
translocation derived from human fibroblasts. In previous studies, the HNPCC 
locus was mapped to the 25 cM region surrounding marker 123 and bordered by 
markers 119 and 136 (Peltomaki et al., 1993a). Thirty-eight hybrids were screened 
with these three chromosome 2p markers. Eight of the hybrids proved useful for 
mapping the relevant portion of chromosome 2p. For example, hybrids L1 and
L2 contained the distal half of the region, including marker 123, while hybrid Y3
contained the half proximal to marker 123 (Figure 1).

Methods

Methods for the derivation of microcell-mediated chromosome 2 hybrids 
have been described previously (Chen et al., 1992; Spurr et al., 1993). Some 
hybrids were generated following fusion of X-irradiated donor cells containing 
human chromosome 2 to CHO cells (Chen et al., 1994). Mouse hybrids were 
derived by fusing HPRT deficient L cells (A9) with human fibroblasts (GM7503) 
containing a t(X;2)(q28;p21) translocation and selecting in media containing HAT.

Example 2

Polymorphic Markers 

To map more finely the HNPCC locus, additional polymorphic markers 
were obtained in three ways. First, a genomic clone containing 85 kb surrounding 
the 123 marker was used for fluorescence in situ hybridization (FISH) to localize 
it to chromosomal band 2p16.3 (Figure 2A,B). The 2p16 band region was then
microdissected, and the sequences within this band were amplified using the 
polymerase chain reaction and subcloned into plasmid vectors (see Experimental
Procedures). The accuracy of the microdissection was confirmed using dual-color
FISH by simultaneously hybridizing to microdissected material and a genomic 
clone containing marker 123 (Figure 2C,D). The subclones were screened by 
hybridization to a (CA)15 probe, and hybridizing clones identified and sequenced.
These sequences were then used to design oligonucleotide primers for PCR
analysis of genomic DNA. Nineteen (CA)n repeat markers were identified in this
way. Of these, four were highly polymorphic and mapped to the region between
markers 119 and 136, as assessed by the somatic cell hybrid panel exhibited in
Figure 1. Second, eight additional (CA)n markers, cloned randomly from human
genomic DNA using a poly (CA) probe, were found to lie between markers 119 
and 136 by linkage analysis in CEPH pedigrees. Five of these were particularly 
informative and were used in our subsequent studies. Finally, one additional 
marker was identified by screening subclones of a genomic P1 clone containing
marker 123 with a (CA)15 probe. Through these analyses, thirteen new
polymorphic markers were identified in the 25 cM interval between markers 119
and 136, resulting in an average marker spacing of ~2 cM (Table II). These
markers were mapped with respect to one another by linkage in CEPH and 
HNPCC pedigrees as well as by analysis of somatic cell hybrids. These two 
mapping techniques provided consistent and complementary information. For 
example, the relative positions of CA16 and CA18 could not be distinguished 
through linkage analysis but could be determined with the somatic cell hybrids L1,
L2, and Y3. Conversely, the relative position of the ze3 and yh5 markers could 
not be determined through somatic cell hybrid mapping, but could be discerned by 
linkage analysis.

Methods

All markers were obtained by screening human genomic libraries with
radiolabelled (CA)n probes (Weber and May, 1989). The "T" markers (see Table
1) were generated from a library made from total human genomic DNA, as
described in Weissenbach et al., 1992. The "M" markers were made from
libraries generated from microdissected chromosome 2p16, as described below.
The CA2 marker was generated from a library made from P1 clone 210 digested
to completion with Sau3 and cloned into the XhoI site of lambda YES (Elledge et
al., 1991). The sequences of the clones obtained from these libraries were 
determined, and primers surrounding the CA repeats chosen. Only primers giving 
robust amplification and high heterozygosity were used for detailed analysis of 
HNPCC kindreds. All markers used in this study were shown to be derived from 
chromosome 2p by both linkage analysis in the CEPH pedigrees and evaluation in 
the somatic cell hybrid panel shown in Figure 1. The sequences of the primers 
and other details specific for each marker have been deposited with the Genome 
Data Base. Linkage analyses to obtain the map of the marker loci in CEPH 
families 1331, 1332, 1347, 1362, 1413, 1416, 884, and 102 (Weissenbach et al., 
1992) were performed using the CLINK program of the LINKAGE program 
package (Lathrop et al., 1984) with the no sex difference option and 11-point 
computations. The odds for the best locus order supported by the data were 
evaluated against pairwise inversions of the loci.

Example 3

Genomic Clones

Many of the polymorphic markers shown in Table 1 were used to derive 
genomic clones containing 2p16 sequences. Genomic clones were obtained by
PCR screening of human P1 and YAC libraries with these polymorphic markers,
with ten additional sequence tagged sites (STS) derived from chromosome 2p16
microdissection, or with YAC junctions. Twenty-three P1 clones, each containing
85-95 kb, were obtained, as well as 35 YAC clones, containing 300 to 1800 kb.
The YAC clones in some cases confirmed the linkage and somatic cell hybrid 
maps. For example, markers ze3 and yh5 were both found in YACs 4F4 and 1E1
while CA16 and CA18 were both found in YAC 8E5, documenting their
proximity. The highest density of genomic clones (28 YAC and 17 P1 clones) was
obtained between markers yb9 and yh5 (Table 1), which became the region most
likely to contain the HNPCC gene during the course of these studies. The region
between yh5 and yb9 was predicted to contain ~9 Mb (assuming 1 Mb per cM).
Based on the sizes of the YAC clones, and taking into account their chimerism,
we estimated that they contained over 70% of the sequences between yh5 and yb9.

Methods

The markers described in Table 1 were used to screen YAC or P1 libraries
by PCR. The CEPH A library was obtained from Research Genetics, Inc. and 
consisted of 21,000 YAC clones, arrayed in a format allowing facile screening and 
unambiguous identification of positive clones. The sizes of ten of the YAC clones 
containing markers were determined by transverse alternating pulse-field gel 
electrophoresis using a GeneLine II apparatus from Beckman and found to average
0.7 Mb (range 0.2-1.8 Mb). In some cases, inverse PCR was used to determine
the YAC junctions (Joslyn et al., 1991), and the derived sequence used for
"chromosome walking" with the YAC or P1 libraries. The junctions were also
used to design primers to test whether the ends of the YAC clones could be
localized to chromosome 2p16 (and therefore presumably be non-chimeric). Three
of four YAC clones which were tested in this way had both ends within the
expected region of chromosome 2, as judged by analysis with the somatic cell
hybrid panel. The human genomic P1 library was also screened by PCR
(Genome Systems, Inc.). P1 clones M1015 and M1016, containing the hMSH2
gene, were used to determine intron-exon borders using sequencing primers from 
the exons and SequiTherm™ polymerase (Epicentre Technologies).

Example 4

Analysis of HNPCC Families 

The markers described in Table 1 were then used to analyze six large 
HNPCC kindreds previously linked to chromosome 2p (Peltomaki et al., 1993a). 
Two hundred thirteen individuals, including 56 members affected with colorectal
or endometrial cancer, were examined. Four of the kindreds were from the United
States, one from Newfoundland and one from New Zealand. To increase the
number of affected individuals that could be examined, we obtained formalin-fixed, 
paraffin-embedded sections of normal tissues from deceased individuals and 
purified DNA from them (Goelz et al., 1985a). A single allele of each of the 
thirteen markers was found to segregate with disease in each of the six families 
(i.e., the allele was found in over 50% of affected individuals). No allele of any 
marker was shared among the affected members of more than three kindreds. 

Fourteen of the affected members contained only a subset of the expected 
alleles and therefore had undergone recombination between markers 119 and 136. 
Eleven of these individuals appeared to have simple, single recombination events. 
The most informative of these were individual 148 from the J kindred and
individual 44 from kindred 621 (Figure 3). Individual 621-44 apparently retained
the disease-linked allele at markers distal to CA5, while demonstrating multiple
recombinants at more proximal loci, thus placing the CA5 marker at the proximal 
border of the HNPCC locus. Individual J-148 apparently retained the disease- 
linked allele at all markers proximal to yh5, while exhibiting recombinants at yh5 
and 119, thus placing the distal border at yh5. Assuming that the same gene was 
involved in both the J and 621 kindreds, the HNPCC gene was predicted to reside 
between markers CA5 and yh5, an area spanning approximately 2 cM (Table 1).
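The narrowing of the interval from single recombinants can be sketched in Python as follows. The sketch uses a reduced marker list (119, yh5, ze3, CA5, 136, ordered from telomere to centromere) and encodes the two individuals as described in the text. Treating CA5 as recombinant in 621-44 is an interpretive assumption for this illustration, and the function and variable names are hypothetical.

```python
def candidate_interval(marker_order, individuals):
    """Return (distal_border, shared_markers, proximal_border).

    `individuals` maps an identifier to {marker: True if the disease-linked
    allele was retained, False if a recombination removed it}.
    Assumes the shared markers form one contiguous block.
    """
    shared = [m for m in marker_order
              if all(status[m] for status in individuals.values())]
    if not shared:
        raise ValueError("no marker is retained by every individual")
    first = marker_order.index(shared[0])
    last = marker_order.index(shared[-1])
    distal_border = marker_order[first - 1] if first > 0 else None
    proximal_border = marker_order[last + 1] if last + 1 < len(marker_order) else None
    return distal_border, shared, proximal_border


# Reduced marker set, telomere to centromere (simplified for illustration).
ORDER = ["119", "yh5", "ze3", "CA5", "136"]
AFFECTED = {
    # Retained distal markers, recombinant at CA5 and proximal loci.
    "621-44": {"119": True, "yh5": True, "ze3": True, "CA5": False, "136": False},
    # Retained proximal markers, recombinant at yh5 and 119.
    "J-148": {"119": False, "yh5": False, "ze3": True, "CA5": True, "136": True},
}
print(candidate_interval(ORDER, AFFECTED))
```

Under these assumptions the shared segment is bounded distally by yh5 and proximally by CA5, which matches the interval stated in the text.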

The DNA of three affected individuals (C-202, 4-156, 4-92) appeared to 
have undergone two recombinations in the area. There was probably one 
recombinant per generation, and this could be demonstrated in C-202 by analysis 
of DNA from his parents; in the other cases, parental DNA was not available. All 
three individuals retained disease-linked alleles at CA5 and ze3 but not at more
proximal and distal loci (Figure 3). Combined with the data from the patients with 
single recombinations, the double recombinants suggested that the HNPCC gene 
resided between CA5 and ze3, a distance spanning less than 2 cM.

To determine the physical distances separating CA5, ze3, and yh5,
metaphase and interphase FISH analysis was carried out. Dual-color FISH with 
PI clones containing these markers was performed with PI clones 820 and 838 
(containing the markers CAS and yhS, respectively) labeled with biotin and 
detected with fluorescein-labelled avidin, and clone 836 (containing marker ze3) 
labelled with Spectrum-Orange (Meltzer et al., 1992). The hybridization signals 
of these markers appeared coincident on metaphase chromosomes, confirming that 
they resided within an interval of <1.0 Mb. When FISH was performed on 
interphase nuclei, the relative positions of the three markers could be determined 
and the distances between them estimated (Trask et al., 1989). The results 
confirmed that the orientation of the markers was telomere-yh5-ze3-CA5- 
centromere (data not shown). Direct measurement of the distance between yhS 
and ze3 was estimated at <0.3 Mb, consistent with the presence of both of these 
markers on YAC clones 4E4 and 1E1. Measurements of 48 interphase 
chromosomes provided an estimate of the distance between ze3 and CAS at <0.8 
Mb, independently confirming the linkage data. 
Methods 

G-banded metaphase chromosomes were microdissected with glass 
microneedles and amplified by PCR as previously described (Guan et al., 1993; 
Kao and Yu, 1991). For dual-color FISH, the PCR product was fluorochrome 
labelled (Spectrum-Orange, Imagenetics, Naperville, IL) or biotinylated in a 
secondary PCR reaction. P1 clones were labelled by nick-translation or by 
degenerate oligonucleotide primers (Guan et al., 1993). FISH was carried out as 
previously described (Guan et al., 1993) and visualized with a Zeiss Axiophot 
equipped with a dual-bandpass filter. For analysis of interphase FISH patterns, the 
distance between hybridization signals was measured in a minimum of 24 nuclei 
(Trask et al., 1989). 

Example 5 
Candidate Genes 

On the basis of the mapping results described above, we could determine 
whether a given gene was a candidate for HNPCC by determining its position 
relative to the CA5-ze3 domain. The first gene considered was a human homolog 
of the Drosophila SOS gene (reviewed by Egan and Weinberg, 1993). This gene 
transmits signals from membrane bound receptors to the ras pathway in diverse 
eukaryotes. It was considered a candidate because another ras-interacting gene, 
NF1, causes a cancer predisposition syndrome (Viskochil et al., 1990; Wallace et 
al., 1990), and SOS has been localized to chromosome 2p16-21 by in situ 
hybridization (Webb et al., 1993). Using PCR to amplify SOS sequences from the 
hybrid panel, however, SOS was found to be distal to the CA5-yh5 domain 
(present in hybrid Z19 but not Z29). 

We next examined the interferon-inducible RNA activated protein kinase 
gene PKR. This gene has been shown to have tumor suppressor ability (reviewed 
by Lengyel, 1993) and to map close to 2p16 (Hanash et al., 1993). We could not 
initially exclude PKR from the HNPCC domain, and therefore determined the 
sequence of its coding region in two individuals from HNPCC kindred C. Reverse 
transcriptase was used to generate cDNA from lymphoblastoid-derived RNA of 
these two individuals, and PCR was performed with primers specific for PKR. The 
PKR products were sequenced, and no deviations from the published sequence were 
identified within the coding region (Meurs et al., 1990). Subsequent studies 
showed that the PKR gene was distal to the yh5 marker, and thus could be 
excluded as a basis for HNPCC. 

We then considered human homologs of the MutL and MutS mismatch 
repair genes previously shown to produce microsatellite instability in bacteria and 
yeast when disrupted (Levinson and Gutman, 1987; Strand et al., 1993). A human 
homolog of the yeast MutL-related gene PMS1 (Kramer et al., 1989) does not 
appear to reside on chromosome 2p (M. Liskay, personal communication). To 
identify homologs of MutS, we used degenerate oligonucleotide primers to 
PCR-amplify cDNA from colon cancer cell lines. The same primers had been 
previously used to identify the yeast MSH2 gene on the basis of its MutS homology 
(Reenan and Kolodner, 1992). Under non-stringent conditions of PCR, fragments 
of the expected size were obtained and cloned into plasmid 
vectors. Most of the clones contained ribosomal RNA genes, representing 
abundant transcripts with weak homology to the degenerate primers. A subset of 
the clones, however, contained sequences similar to that of the yeast MSH2 gene, 
and one such clone, pNP-23, was evaluated further. The human gene from which 
this clone is derived is hereafter referred to as hMSH2. 

The insert from clone pNP-23 was used as a probe in Southern blots of 
somatic cell hybrid DNA. This insert hybridized to one or two fragments in 
human genomic DNA digested with PstI or EcoRI, respectively, and these 
fragments were present in hybrid Z30, containing most of human chromosome 2p. 
Analysis of other hybrids showed that the fragment was present in hybrids Z11, 
Z29, L1 and L2, but not Z12, Y3 or Z19, thereby localizing the human MSH2 
(hMSH2) gene to a region bordered by markers CA18 and 119 (examples in Figure 
4). The YAC clones listed in Table 1 were then analyzed, and EcoRI and PstI 
fragments of the expected size identified in YAC 5A11, derived from screening 
the YAC library with the CA5 marker (Figure 4). 

To confirm the Southern blots, we designed non-degenerate primers on the 
basis of the sequence of pNP-23. Several sets of primers were tested so that 
genomic DNA could be used as a template for PCR; an intervening intron 
prevented the original primers from being used effectively with templates other 
than cDNA. PCR with these primers was perfectly consistent with the Southern 
blot results. The expected 101 bp fragment was present in hybrids Z4, Z11, Z29, 
L1, and L2, and in YAC 5A11, but not in other hybrids or YAC clones (not 
shown). 

The localization of hMSH2 sequences to YAC 5A11 demonstrated the 
proximity of these sequences to marker CA5. To determine the distance and 
relative orientation of hMSH2 with respect to CA5, we performed interphase FISH 
analysis. P1 clones containing CA5, ze3 and hMSH2 sequences (clones 820, 836, 
and M1015, respectively) were simultaneously hybridized to interphase nuclei 
following fluorescein and Spectrum Orange labelling (Meltzer et al., 1992). The 
results demonstrated that MSH2 resides within the HNPCC locus defined by 
linkage analysis to lie between CA5 and ze3, and less than 0.3 Mb from marker 
CA5 (Figure 2E). 

cDNA libraries generated from human colon cancer cells or from human 
fetal brain tissues were then screened with the insert of pNP-23 to obtain additional 
sequences from this gene. Seventy-five cDNA clones were initially identified and 
partially sequenced. PCR products representing the ends of the cDNA sequence 
contig were then used as probes to re-screen the cDNA libraries. This cDNA 
"walk" was repeated again with the new contig ends. Altogether, 147 cDNA 
clones were identified. The composite sequence derived from these clones is 
shown in Figure 5. An open reading frame (ORF) began 69 nt downstream of the 
5' end of the cDNA contig, and continued for 2802 bp. The methionine initiating 
this ORF was in a sequence context compatible with efficient translation (Kozak, 
1986) and was preceded by in-frame termination codons. RNA from placenta and 
brain was used in a PCR-based procedure (RACE, Frohman et al., 1988) to 
independently determine the position of the 5' end of hMSH2 transcripts. This 
analysis demonstrated that the 5' ends of all detectable transcripts were less than 
100 bp upstream of the sequence shown in Figure 5, and were heterogeneous 
upstream of nt -69. The region of highest homology to the yeast MSH2 gene is 
shown in Figure 6. This region encompassed the helix-turn-helix domain perhaps 
responsible for MutS binding to DNA (Reenan and Kolodner, 1992). The yeast 
and human MSH2 proteins were 77% identical between codons 615 and 788. 
There were several other blocks of similar amino acids distributed throughout the 
length of these two proteins (966 and 934 amino acids in yeast and human, 
respectively). 
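
As a rough consistency check on the ORF described above, the following sketch translates the start of the reading frame and converts the stated ORF length into a codon count. It is illustrative only: the function names are hypothetical, Python is an arbitrary choice, and the 60-nucleotide fragment is copied from the 5' region of SEQ ID NO:1 in the sequence listing (six consecutive ten-base blocks without ambiguous positions).

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from_first_atg(dna):
    """Translate from the first ATG until a stop codon or the end of the fragment."""
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def codon_count(orf_length_bp):
    """Number of complete codons in an ORF of the given length in base pairs."""
    if orf_length_bp % 3:
        raise ValueError("ORF length is not a multiple of three")
    return orf_length_bp // 3


# Six consecutive blocks from the 5' region of SEQ ID NO:1 (assumed error-free).
FIVE_PRIME_FRAGMENT = (
    "GTTTCGACAT" "GGCGGTGCAG" "CCGAAGGAGA"
    "CGCTGCAGTT" "GGAGAGCGCG" "GCCGAGGTCG"
)
```

Under these assumptions the translated fragment begins Met-Ala-Val-Gln-Pro-Lys, in agreement with the start of SEQ ID NO:2, and a 2802 bp ORF corresponds to the 934 residues stated for the human protein.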

Methods 

cDNA generated from the RNA of colorectal cancer cells with reverse 
transcriptase was used as template for PCR with the degenerate primers 
5'-CTG GAT CCA C(G/A/T/C) GG(G/A/T/C) CC(G/A/T/C) AA(T/C) ATG-3' and 
5'-CTG GAT CC(G/A) TA(G/A) TG(G/A/T/C) GT(G/A/T/C) (G/A)C(G/A) AA-3'. 
These two primers were used previously to identify the yeast MSH2 gene and were 
based on sequences conserved among related mammalian and bacterial genes 
(Reenan and Kolodner, 1992). The optimal PCR conditions for detecting the 
human MSH2 gene consisted of 35 cycles at 95° for 30 seconds, 41° for 90 
seconds, and 70° for 90 seconds, in the buffer described previously (Sidransky et 
al., 1991). PCR products were cloned into T-tailed vectors as described (Holton 
and Graham, 1991) and sequenced with modified T7 polymerase (USB). The 
insert from one clone (pNP-23) containing human sequences homologous to the 
yeast MSH2 gene was then used to screen cDNA libraries generated from RNA of 
SW480 colon cancer cells (Clontech) or of fetal brain (Stratagene). After two 
further rounds of screening, positive clones were converted into plasmids and 
sequenced using modified T7 polymerase (Kinzler et al., 1991). In some cases, 
the inserts were amplified using one hMSH2-specific primer and one vector- 
specific primer, and then sequenced with SequiTherm Polymerase (Epicentre 
Technologies). To determine the 5' end of MSH2 transcripts, RACE was 
performed (Frohman et al., 1989) using brain and placenta cDNA (Clontech). 
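
The degenerate primers given in these Methods can be written compactly with IUPAC ambiguity codes, as is done in SEQ ID NO:3 and SEQ ID NO:4. The sketch below is a hypothetical helper, not part of the original procedure; it collapses the bracketed alternatives and counts how many distinct oligonucleotides each primer pool represents. Only the three ambiguity codes needed here (N, R, Y) are handled.

```python
import re

# IUPAC codes for the base sets that occur in the two primers.
IUPAC = {
    frozenset("GATC"): "N",
    frozenset("GA"): "R",
    frozenset("TC"): "Y",
}


def to_iupac(primer):
    """Collapse '(G/A/T/C)'-style alternatives into single IUPAC letters."""
    compact = primer.replace(" ", "")
    return re.sub(
        r"\(([ACGT/]+)\)",
        lambda m: IUPAC[frozenset(m.group(1).replace("/", ""))],
        compact,
    )


def degeneracy(iupac_primer):
    """Number of distinct oligonucleotides represented by an IUPAC primer."""
    sizes = {"N": 4, "R": 2, "Y": 2}
    total = 1
    for base in iupac_primer:
        total *= sizes.get(base, 1)
    return total


# Primer sequences as given in the Methods (spacing regrouped for readability).
SENSE = "CTG GAT CCA C(G/A/T/C) GG(G/A/T/C) CC(G/A/T/C) AA(T/C) ATG"
ANTISENSE = "CTG GAT CC(G/A) TA(G/A) TG(G/A/T/C) GT(G/A/T/C) (G/A)C(G/A) AA"
```

With these inputs, `to_iupac` reproduces the 23-mers of SEQ ID NO:3 and SEQ ID NO:4, and the two pools contain 128 and 256 sequences, respectively.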
Example 6 
Mutations of hMSH2 

The physical mapping of hMSH2 to the HNPCC locus was intriguing but 
could not prove that this gene was responsible for the disease. To obtain more 
compelling evidence, we determined whether germ line mutations of hMSH2 were 
present in the two HNPCC kindreds that originally established linkage to 
chromosome 2 (Peltomaki et al., 1993a). Intron-exon borders within the most 
conserved region of hMSH2 (Figure 6) were determined by sequencing genomic 
PCR fragments containing adjacent exons. Genomic DNA samples from the 
lymphocytes of affected members of these two kindreds were then used as 
templates for PCR to determine the sequence of this domain. The DNA from 
individual J-42, afflicted with colon and endometrial cancer at ages 42 and 44, 
respectively, was found to contain one allele with a C to T transition at codon 622 
(CCA to CTA), resulting in a substitution of leucine for proline (Figure 7, top). 
Twenty additional DNA samples from unrelated individuals all encoded proline at 
this position. Twenty-one members of the J kindred were then analyzed by direct 
sequencing of PCR products. All eleven affected individuals contained one allele 
with a C to T transition in codon 622, while all ten unaffected members contained 
two normal alleles, thus documenting perfect segregation with disease. 
Importantly, this proline was at a highly conserved position, the identical residue 
being found in all known MutS related genes from prokaryotes and eukaryotes 
(Figure 6 and Reenan and Kolodner, 1992). 

No mutations of the conserved region of MSH2 were identified in kindred 
C, so we next examined other parts of the hMSH2 transcript. RNA was purified 
from lymphoblastoid cells of patient C-202, a 27 year old male with colon cancer. 
Reverse transcriptase coupled PCR (RT-PCR) was used to generate four hMSH2- 
specific products encompassing codons 89 to 934 from this RNA (see Experimental 
Procedures). An abnormal, smaller RT-PCR product was identified with one of 
the primer pairs used. Mapping and sequencing studies using various MSH2 
primers showed that the abnormal product was the result of a presumptive splicing 
defect which removed codons 265 to 314 from the hMSH2 transcript. The 
abnormal transcript was found to segregate with disease in the C kindred, and was 
not found in twenty unrelated individuals. 

We next wished to determine whether hMSH2 was altered in one of the 
more recently linked families (R*P. and M.N-L., unpublished data) , and chose 
kindred 8 for detailed analysis. DNA and RNA were obtained from 
lymphoblastoid cells of 8-143, a 42 year old male with colon cancer. The 
conserved region of hMSH2 was amplified from genomic DNA using PCR and 
directly sequenced. A T to C substitution was noted in the polypyrimidine tract 
upstream of the exon beginning at codon 669 (at intron position -6). However, 
this substitution was also found in two of twenty unrelated, normal individuals, and 
was therefore a polymorphism unrelated to the disease, with an allele frequency 
of 0.05. Most of the hMSH2 coding region was then amplified by RT-PCR, as 
described above, and no abnormal transcripts were detected. Sequencing of the 
PCR products, however, revealed a C to T transition at codon 406 (CGA to TGA) 
causing substitution of a termination codon for an arginine residue. RNA was 
available from the lymphocytes of a second affected member of kindred 8, and the 
same stop codon was identified. This alteration was not found in twenty other, 
unrelated individuals. 
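
The allele frequency of 0.05 quoted for the intronic polymorphism follows from simple counting if the two normal carriers are each assumed to be heterozygous; the text does not state their zygosity, so that is an assumption of this sketch, as is the hypothetical function name.

```python
def allele_frequency(carriers, individuals, copies_per_carrier=1):
    """Variant allele frequency among the 2N chromosomes of N diploid individuals.

    copies_per_carrier=1 assumes every carrier is heterozygous.
    """
    if individuals <= 0:
        raise ValueError("need at least one individual")
    return carriers * copies_per_carrier / (2 * individuals)
```

Two heterozygous carriers among twenty individuals give 2 variant alleles out of 40 chromosomes, which matches the figure reported above.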

Finally, we wished to determine whether mutations of this gene occurred 
in RER+ tumors from patients without evident family histories of cancer. The 
conserved region of MSH2 was studied in four colorectal tumor cell lines from 
such patients using genomic DNA as templates for PCR. One tumor (from patient 
Cx10) was found to contain two hMSH2 alterations. The first was a C to T 
transition in codon 639 (CAT to TAT), resulting in a substitution of tyrosine for 
histidine. This change was not found in any of twenty samples from unrelated 
individuals, but was present in the DNA from normal colon of this patient, and 
was therefore likely to represent a germ line change (Figure 7, middle). Like the 
missense mutation in the J kindred, the Cx10 alteration was at a position perfectly 
conserved in all MutS homologs (Figure 6 and Reenan and Kolodner, 1992). The 
second alteration in the tumor from Cx10 was a substitution of a GT dinucleotide 
for an A in codon 663 (ATG to TGTG). The resultant one bp insertion was 
predicted to cause a frameshift, producing a termination codon 36 nt downstream. 
This mutation was demonstrated in both RNA and DNA purified from the Cx10 
tumor, but was not present in the patient's normal colon, so represented a somatic 
mutation (Figure 7, bottom). The PCR products from Cx10 were cloned and 
sequenced, and the insertion mutation at codon 663 and the transition at codon 639 
were shown to reside on different alleles. 
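
The single-codon substitutions reported in this Example can be checked against the standard genetic code. The following is a minimal sketch with hypothetical names; the codon pairs are taken from the text, and the classification into missense and nonsense changes is the usual textbook one rather than anything specific to this work.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(normal, mutant):
    """Return (normal_residue, mutant_residue, kind) for a single-codon change."""
    before, after = CODON_TABLE[normal], CODON_TABLE[mutant]
    if before == after:
        kind = "silent"
    elif after == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return before, after, kind


# Codon changes as reported in the text (codon number: normal, mutant).
REPORTED = {
    622: ("CCA", "CTA"),  # kindred J
    406: ("CGA", "TGA"),  # kindred 8
    639: ("CAT", "TAT"),  # germ line change in the sporadic tumor patient
}
```

For example, `classify_substitution(*REPORTED[406])` identifies the kindred 8 change as an arginine codon converted to a termination codon, while the changes at codons 622 and 639 are missense substitutions.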

Methods 

To detect mutations, PCR products were generated from cDNA and human 
genomic DNA templates, then sequenced directly using SequiTherm™. In some 
cases, the PCR products were cloned into T-tailed vectors for sequencing to 
confirm the direct sequencing data. The primers used to amplify the conserved 
region of MSH2 from genomic DNA were 5'-CCA CAA TGG ACA CTT CTG 
C-3' and 5'-CAC CTG TTC CAT ATG TAC G-3', resulting in a 1.4 kb fragment 
containing hMSH2 codons 614 to 705, and primers 5'-AAA ATG GGT TGC AAA 
CAT GC-3' and 5'-GTG ATA GTA CTC ATG GCC C-3', resulting in a 2.0 kb 
fragment containing MSH2 cDNA codons 683 to 783. Primers for RT-PCR were 
5'-AGA TCT TCT TCT GGT TCG TC-3' and 5'-GCC AAC AAT AAT TTC 
TGG TG-3' for codons 89 to 433, 5'-TGG ATA AGA ACA GAA TAG AGG-3' 
and 5'-CCA CAA TGG ACA CTT CTG C-3' for codons 350-705, 5'-CAC CTG 
TTC CAT ATG TAC G-3' and 5'-AAA ATG GGT TGC AAA CAT GC-3' for 
codons 614 to 783, and 5'-GTG ATA GTA CTC ATG GCC C-3' and 5'-GAC 
AAT AGC TTA TCA ATA TTA CC-3' for codons 683-949. 
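
Because half of the primers listed above anneal to the strand complementary to the cDNA sequence, locating a primer site requires searching for its reverse complement as well. The sketch below illustrates this with the RT-PCR primer 5'-GCC AAC AAT AAT TTC TGG TG-3' and a 30-nucleotide stretch copied from SEQ ID NO:1; the helper names are hypothetical and the stretch is assumed to be free of transcription errors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_primer(template, primer):
    """Return (strand, 0-based index) of the primer site on the template, or None.

    '+' means the primer matches the template as written; '-' means it
    anneals to the template and is found as its reverse complement.
    """
    idx = template.find(primer)
    if idx >= 0:
        return "+", idx
    idx = template.find(reverse_complement(primer))
    if idx >= 0:
        return "-", idx
    return None


# Three consecutive ten-base blocks copied from SEQ ID NO:1.
CDNA_STRETCH = "GAAGGAAAAC" "ACCAGAAATT" "ATTGTTGGCA"
DOWNSTREAM_PRIMER = "GCCAACAATAATTTCTGGTG"
```

Here the primer is found only in the reverse-complement orientation, as expected for the downstream member of a primer pair.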

Discussion 

Three major conclusions can be drawn from the examples described here. 
First, physical mapping and linkage analysis localized the HNPCC locus on 
chromosome 2 to a 0.8 Mb segment bordered by markers CA5 and ze3. Second, 
a new human homolog of the yeast MSH2 gene was identified, and this gene 
shown to lie in the same 0.8 Mb interval. Third, alterations of the hMSH2 gene 
occurred in the germ line of patients with RER+ tumors, with or without classical 
HNPCC, and additional somatic alterations of this gene occurred in tumors 
(summarized in Table I). The alterations were at highly conserved regions or 
significantly altered the expected gene product and thus represent mutations with 
important functional effects. These results indicate that mutations of hMSH2 are 
responsible for HNPCC and the RER+ phenotype found in tumors. 

These data have substantial implications for understanding the neoplastic 
disease observed in HNPCC. In particular, they suggest that the microsatellite 
alterations previously observed in tumors from these patients are not 
epiphenomena, but are intrinsically related to pathogenesis. Additionally, the 
mutations observed in yeast and bacteria with defective MutS-related genes are not 
confined to insertions and deletions at simple repeated sequences, though these 
sequences provide convenient tools for analysis (Modrich, 1991). Similarly, one 
would expect that many mutations, in addition to microsatellite insertions or 
deletions, would be found in HNPCC tumors. This could lead to the multiple, 
sequential mutations in oncogenes and tumor suppressor genes which have been 
shown to drive colorectal tumorigenesis (Fearon and Vogelstein, 1990). Thus, the 
molecular pathogenesis of HNPCC tumors is likely to be similar to that occurring 
in non-HNPCC cases, though accelerated by the increased rate of mutation 
associated with mismatch repair defects. Accordingly, colon tumors from HNPCC 
patients have been shown to contain mutations of APC, p53 and RAS at 
frequencies similar to those found in sporadic colorectal cancers (Aaltonen et al., 
1993). 

Colorectal tumors from HNPCC patients are distinguished by their 
relatively normal cytogenetic composition (Kouri et al., 1990), and sporadic, 
RER+ tumors have been demonstrated to have substantially fewer chromosome 
losses than those occurring in RER- cases (Thibodeau et al., 1993; Aaltonen et al., 
1993). These data suggest that genetic heterogeneity is critical for colorectal 
cancer development, but can be generated in two different ways (Thibodeau et al., 
1993). Most commonly, it develops through gross alterations resulting in 
aneuploidy, as suggested nearly eighty years ago (Boveri, 1914). In HNPCC-derived 
tumors and RER+ sporadic tumors, the diversity is presumably more 
subtle, consisting of multiple small sequence changes distributed throughout the 
genome. The latter mechanism of generating diversity may be less dangerous to 
the host, as HNPCC patients, as well as patients with RER+ sporadic tumors, 
appear to have a better prognosis than would be expected from histopathologic 
analysis of their tumors (Ionov et al., 1993; Thibodeau et al., 1993; Lothe et al., 
1993; Lynch et al., 1993). 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2947 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 



GGCGGGAAAC AGCTTAGTGG GTGTGGGGTC GCGCATTTTC TTCAACCAGG AGGTGAGGAG 60
GTTTCGACAT GGCGGTGCAG CCGAAGGAGA CGCTGCAGTT GGAGAGCGCG GCCGAGGTCG 120
GCTTCGTGCG CTTCTTTCAG GGCATGCCGG AGAAGCCGAC CACCACAGTG CGCCTTTTCG 180
ACCGGGGCGA CTTCTATACG GCGCACGGCG AGGACGCGCT GCTGGCCGCC CGGGAGGTGT 240
TCAAGACCCA GGGGGTGATC AAGTACATGG GGCCGGCAGG AGCAAAGAAT CTGCAGAGTG 300
TTGTGCTTAG TAAAATGAAT TTTGAATCTT TTGTAAAAGA TCTTCTTCTG GTTCGTCAGT 360
ATAGAGTTGA AGTTTATAAG AATAGAGCTG GAAATAAGGC ATCCAAGGAG AATGATTGGT 420
ATTTGGCATA TAAGGCTTCT CCTGGCAATC TCTCTCAGTT TGAAGACATT CTCTTTGGTA 480
ACAATGATAT GTCAGCTTCC ATTGGTGTTG TGGGTGTTAA AATGTCCGCA GTTGATGGCC 540
AGAGACAGGT TGGAGTTGGG TATGTGGATT CCATACAGAG GAAACTAGGA CTGTGTGAAT 600
TCCCTGATAA TGATCAGTTC TCCAATCTTG AGGCTCTCCT CATCCAGATT GGACCAAAGG 660
AATGTGTTTT ACCCGGAGGA GAGACTGCTG GAGACATGGG GAAACTGAGA CAGATAATTC 720
AAAGAGGAGG AATTCTGATC ACAGAAAGAA AAAAAGCTGA CTTTTCCACA AAAGACATTT 780
ATCAGGACCT CAACCGGTTG TTGAAAGGCA AAAAGGGAGA GCAGATGAAT AGTGCTGTAT 840
TGCCAGAAAT GGAGAATCAG GTTGCAGTTT CATCACTGTC TGCGGTAATC AAGTTTTTAG 900
AACTCTTATC AGATGATTCC AACTTTGGAC AGTTTGAACT GACTACTTTT GACTTCAGCC 960
AGTATATGAA ATTGGATATT GCAGCAGTCA GAGCCCTTAA CCTTTTTCAG GGTTCTGTTG 1020
AAGATACCAC TGGCTCTCAG TCTCTGGCTG CCTTGCTGAA TAAGTGTAAA ACCCCTCAAG 1080
GACAAAGACT TGTTAACCAG TGGATTAAGC AGCCTCTCAT GGATAAGAAC AGAATAGAGG 1140
AGAGATTGAA TTTAGTGGAA GCTTTTGTAG AAGATGCAGA ATTGAGGCAG ACTTTACAAG 1200
AAGATTTACT TCGTCGATTC CCAGATCTTA ACCGACTTGC CAAGAAGTTT CAAAGACAAG 1260
CAGCAAACTT ACAAGATTGT TACCGACTCT ATCAGGGTAT AAATCAACTA CCTAATGTTA 1320
TACAGGCTCT GGAAAAACAT GAAGGAAAAC ACCAGAAATT ATTGTTGGCA GTTTTTGTGA 1380
CTCCTCTTAC TGATCTTCGT TCTGACTTCT CCAAGTTTCA GGAAATGATA GAAACAACTT 1440
TAGATATGGA TCAGGTGGAA AACCATGAAT TCCTTGTAAA ACCTTCATTT GATCCTAATC 1500
TCAGTGAATT AAGAGAAATA ATGAATGACT TGGAAAAGAA GATGCAGTCA ACATTAATAA 1560
GTGCAGCCAG AGATCTTGGC TTGGACCCTG GCAAACAGAT TAAACTGGAT TCCAGTGCAC 1620
AGTTTGGATA TTACTTTCGT GTAACCTGTA AGGAAGAAAA AGTCCTTCGT AACAATAAAA 1680
ACTTTAGTAC TGTAGATATC CAGAAGAATG GTGTTAAATT TACCAACAGC AAATTGACTT 1740
CTTTAAATGA AGAGTATACC AAAAATAAAA CAGAATATGA AGAAGCCCAG GATGCCATTG 1800
TTAAAGAAAT TGTCAATATT TCTTCAGGCT ATGTAGAACC AATGCAGACA CTCAATGATG 1860
TGTTAGCTCA GCTAGATGCT GTTGTCAGCT TTGCTCACGT GTCAAATGGA GCACCTGTTC 1920
CATATGTACG ACCAGCCATT TTGGAGAAAG GACAAGGAAG AATTATATTA AAAGCATCCA 1980
GGCATGCTTG TGTTGAAGTT CAAGATGAAA TTGCATTTAT TCCTAATGAC GTATACTTTG 2040
AAAAAGATAA ACAGATGTTC CACATCATTA CTGGCCCCAA TATGGGAGGT AAATCAACAT 2100
ATATTCGACA AACTGGGGTG ATAGTACTCA TGGCCCAAAT TGGGTGTTTT GTGCCATGTG 2160
AGTCAGCAGA AGTGTCCATT GTGGACTGCA TCTTAGCCCG AGTAGGGGCT GGTGACAGTC 2220
AATTGAAAGG AGTCTCCACG TTCATGGCTG AAATGTTGGA AACTGCTTCT ATCCTCAGGT 2280
CTGCAACCAA AGATTCATTA ATAATCATAG ATGAATTGGG AAGAGGAACT TCTACCTACG 2340
ATGGATTTGG GTTAGCATGG GCTATATCAG AATACATTGC AACAAAGATT GGTGCTTTTT 2400


n p a 'VdTTTfzr* 

uwniu ±. A. X \j\*> 


Ai\^^»v#AJL X X X 


\~n. X \aJt\£\\s X in 




f a n tp a n& *r & 

VrAh X X A 


^nnuiui jl x*x 




ATAATCTACA TGTCACAGCA CTCACCACTG AAGAGACCTT AACTATGCTT TATCAGGTGA 2520
AGAAAGGTGT CTGTGATCAA AGTTTTGGGA TTCATGTTGC AGAGCTTGCT AATTTCCCTA 2580
AGCATGTAAT AGAGTGTGCT AAACAGAAAG CCCTGGAACT TGAGGAGTTT CAGTATATTG 2640
GAGAATCGCA AGGATATGAT ATCATGGAAC CAGCAGCAAA GAAGTGCTAT CTGGAAAGAG 2700
AGCAAGGTGA AAAAATTATT CAGGAGTTCC TGTCCAAGGT GAAACAAATG CCCTTTACTG 2760
AAATGTCAGA AGAAAACATC ACAATAAAGT TAAAACAGCT AAAAGCTGAA GTAATAGCAA 2820
AGAATAATAG CTTTGTAAAT GAAATCATTT CACGAATAAA AGTTACTACG TGAAAAATCC 2880
CAGTAATGGA ATGAAGGTAA TATTGATAAG CTATTGTCTG TAATAGTTTT ATATTGTTTT 2940
ATATTAA 2947



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 934 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: YES 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Ala Val Gln Pro Lys Glu Thr Leu Gln Leu Glu Ser Ala Ala Glu 
1 5 10 15 

Val Gly Phe Val Arg Phe Phe Gln Gly Met Pro Glu Lys Pro Thr Thr 
20 25 30 

Thr Val Arg Leu Phe Asp Arg Gly Asp Phe Tyr Thr Ala His Gly Glu 
35 40 45 

Asp Ala Leu Leu Ala Ala Arg Glu Val Phe Lys Thr Gln Gly Val Ile 
50 55 60 

Lys Tyr Met Gly Pro Ala Gly Ala Lys Asn Leu Gln Ser Val Val Leu 
65 70 75 80 

Ser Lys Met Asn Phe Glu Ser Phe Val Lys Asp Leu Leu Leu Val Arg 
85 90 95 

Gln Tyr Arg Val Glu Val Tyr Lys Asn Arg Ala Gly Asn Lys Ala Ser 
100 105 110 

Lys Glu Asn Asp Trp Tyr Leu Ala Tyr Lys Ala Ser Pro Gly Asn Leu 
115 120 125 

Ser Gln Phe Glu Asp Ile Leu Phe Gly Asn Asn Asp Met Ser Ala Ser 
130 135 140 

Ile Gly Val Val Gly Val Lys Met Ser Ala Val Asp Gly Gln Arg Gln 
145 150 155 160 

Val Gly Val Gly Tyr Val Asp Ser Ile Gln Arg Lys Leu Gly Leu Cys 
165 170 175 

Glu Phe Pro Asp Asn Asp Gln Phe Ser Asn Leu Glu Ala Leu Leu Ile 
180 185 190 

Gln Ile Gly Pro Lys Glu Cys Val Leu Pro Gly Gly Glu Thr Ala Gly 
195 200 205 

Asp Met Gly Lys Leu Arg Gln Ile Ile Gln Arg Gly Gly Ile Leu Ile 
210 215 220 

Thr Glu Arg Lys Lys Ala Asp Phe Ser Thr Lys Asp Ile Tyr Gln Asp 
225 230 235 240 

Leu Asn Arg Leu Leu Lys Gly Lys Lys Gly Glu Gln Met Asn Ser Ala 
245 250 255 

Val Leu Pro Glu Met Glu Asn Gln Val Ala Val Ser Ser Leu Ser Ala 
260 265 270 

Val Ile Lys Phe Leu Glu Leu Leu Ser Asp Asp Ser Asn Phe Gly Gln 
275 280 285 

Phe Glu Leu Thr Thr Phe Asp Phe Ser Gln Tyr Met Lys Leu Asp Ile 
290 295 300 

Ala Ala Val Arg Ala Leu Asn Leu Phe Gln Gly Ser Val Glu Asp Thr 
305 310 315 320 

Thr Gly Ser Gln Ser Leu Ala Ala Leu Leu Asn Lys Cys Lys Thr Pro 
325 330 335 

Gln Gly Gln Arg Leu Val Asn Gln Trp Ile Lys Gln Pro Leu Met Asp 
340 345 350 

Lys Asn Arg Ile Glu Glu Arg Leu Asn Leu Val Glu Ala Phe Val Glu 
355 360 365 

Asp Ala Glu Leu Arg Gln Thr Leu Gln Glu Asp Leu Leu Arg Arg Phe 
370 375 380 

Pro Asp Leu Asn Arg Leu Ala Lys Lys Phe Gln Arg Gln Ala Ala Asn 
385 390 395 400 

Leu Gln Asp Cys Tyr Arg Leu Tyr Gln Gly Ile Asn Gln Leu Pro Asn 
405 410 415 

Val Ile Gln Ala Leu Glu Lys His Glu Gly Lys His Gln Lys Leu Leu 
420 425 430 

Leu Ala Val Phe Val Thr Pro Leu Thr Asp Leu Arg Ser Asp Phe Ser 
435 440 445 

Lys Phe Gln Glu Met Ile Glu Thr Thr Leu Asp Met Asp Gln Val Glu 
450 455 460 

Asn His Glu Phe Leu Val Lys Pro Ser Phe Asp Pro Asn Leu Ser Glu 
465 470 475 480 

Leu Arg Glu Ile Met Asn Asp Leu Glu Lys Lys Met Gln Ser Thr Leu 
485 490 495 

Ile Ser Ala Ala Arg Asp Leu Gly Leu Asp Pro Gly Lys Gln Ile Lys 
500 505 510 

Leu Asp Ser Ser Ala Gln Phe Gly Tyr Tyr Phe Arg Val Thr Cys Lys 
515 520 525 

Glu Glu Lys Val Leu Arg Asn Asn Lys Asn Phe Ser Thr Val Asp Ile 
530 535 540 

Gln Lys Asn Gly Val Lys Phe Thr Asn Ser Lys Leu Thr Ser Leu Asn 
545 550 555 560 

Glu Glu Tyr Thr Lys Asn Lys Thr Glu Tyr Glu Glu Ala Gln Asp Ala 
565 570 575 

Ile Val Lys Glu Ile Val Asn Ile Ser Ser Gly Tyr Val Glu Pro Met 
580 585 590 

Gln Thr Leu Asn Asp Val Leu Ala Gln Leu Asp Ala Val Val Ser Phe 
595 600 605 

Ala His Val Ser Asn Gly Ala Pro Val Pro Tyr Val Arg Pro Ala Ile 
610 615 620 

Leu Glu Lys Gly Gln Gly Arg Ile Ile Leu Lys Ala Ser Arg His Ala 
625 630 635 640 

Cys Val Glu Val Gln Asp Glu Ile Ala Phe Ile Pro Asn Asp Val Tyr 
645 650 655 

Phe Glu Lys Asp Lys Gln Met Phe His Ile Ile Thr Gly Pro Asn Met 
660 665 670 

Gly Gly Lys Ser Thr Tyr Ile Arg Gln Thr Gly Val Ile Val Leu Met 
675 680 685 

Ala Gln Ile Gly Cys Phe Val Pro Cys Glu Ser Ala Glu Val Ser Ile 
690 695 700 

Val Asp Cys Ile Leu Ala Arg Val Gly Ala Gly Asp Ser Gln Leu Lys 
705 710 715 720 

Gly Val Ser Thr Phe Met Ala Glu Met Leu Glu Thr Ala Ser Ile Leu 
725 730 735 

Arg Ser Ala Thr Lys Asp Ser Leu Ile Ile Ile Asp Glu Leu Gly Arg 
740 745 750 

Gly Thr Ser Thr Tyr Asp Gly Phe Gly Leu Ala Trp Ala Ile Ser Glu 
755 760 765 

Tyr Ile Ala Thr Lys Ile Gly Ala Phe Cys Met Phe Ala Thr His Phe 
770 775 780 

His Glu Leu Thr Ala Leu Ala Asn Gln Ile Pro Thr Val Asn Asn Leu 
785 790 795 800 

His Val Thr Ala Leu Thr Thr Glu Glu Thr Leu Thr Met Leu Tyr Gln 
805 810 815 

Val Lys Lys Gly Val Cys Asp Gln Ser Phe Gly Ile His Val Ala Glu 
820 825 830 

Leu Ala Asn Phe Pro Lys His Val Ile Glu Cys Ala Lys Gln Lys Ala 
835 840 845 

Leu Glu Leu Glu Glu Phe Gln Tyr Ile Gly Glu Ser Gln Gly Tyr Asp 
850 855 860 

Ile Met Glu Pro Ala Ala Lys Lys Cys Tyr Leu Glu Arg Glu Gln Gly 
865 870 875 880 

Glu Lys Ile Ile Gln Glu Phe Leu Ser Lys Val Lys Gln Met Pro Phe 
885 890 895 

Thr Glu Met Ser Glu Glu Asn Ile Thr Ile Lys Leu Lys Gln Leu Lys 
900 905 910 

Ala Glu Val Ile Ala Lys Asn Asn Ser Phe Val Asn Glu Ile Ile Ser 
915 920 925 

Arg Ile Lys Val Thr Thr 
930 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
CTGGATCCAC NGGNCCNAAY ATG 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
CTGGATCCRT ARTGNGTNRC RAA 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CCACAATGGA CACTTCTGC 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
CACCTGTTCC ATATGTACG 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
AAAATGGGTT GCAAACATGC 20 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GTGATAGTAC TCATGGCCC 19 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: YES 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
AGATCTTCTT CTGGTTCGTC 20 
(2) INFORMATION FOR SEQ ID NO: 10: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GCCAACAATA ATTTCTGGTG 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TGGATAAGAA CAGAATAGAG G 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 



CCACAATGGA CACTTCTGC 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

CACCTGTTCC ATATGTACG 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AAAATGGGTT GCAAACATGC 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GTGATAGTAC TCATGGCCC 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GACAATAGCT TATCAATATT ACC 23 
CLAIMS 

1. An isolated and purified DNA molecule having a sequence of at 
least about 20 nucleotides of hMSH2, as shown in SEQ ID NO:1. 

2. The DNA molecule of claim 1 which is cDNA. 

3. The DNA molecule of claim 1 which is labeled. 

4. The DNA molecule of claim 1 which is operably linked to a 
promoter sequence. 

5. The DNA molecule of claim 4 which upon transcription produces 
an RNA molecule having the sequence of native hMSH2 mRNA. 

6. An isolated and purified DNA molecule having a sequence of at 
least about 20 nucleotides of an hMSH2 allele found in a tumor wherein said 
DNA molecule contains a mutation relative to hMSH2 shown in SEQ ID 
NO:1. 

7. A method of treating a person predisposed to hereditary non- 
polyposis colorectal cancer to prevent accumulation of somatic mutations, 
comprising: 

administering a DNA molecule which has a sequence of at least 
about 20 nucleotides of hMSH2, as shown in SEQ ID NO:1, to a person having 
a mutation in an hMSH2 allele which predisposes the person to hereditary non- 
polyposis colorectal cancer, wherein said DNA molecule is sufficient to remedy 
the mutation in an hMSH2 allele of the person. 

8. The method of claim 7 wherein said DNA molecule encodes an 
hMSH2 protein as shown in SEQ ID NO:1. 

9. The method of claim 7 wherein said DNA molecule spans the 
mutation site in said allele and remedies the mutation by recombining with said 
mutant allele. 
10. The method of claim 7 wherein said DNA molecule encodes a 
portion of hMSH2 protein which is sufficient to provide DNA mismatch repair 
function. 

11. A method of determining a predisposition to cancer comprising: 

testing a body sample of a human to ascertain the presence of a 
mutation in hMSH2 which affects hMSH2 expression or hMSH2 protein function, 
the presence of such a mutation indicating a predisposition to cancer. 

12. The method of claim 11 wherein the sample is DNA. 

13. The method of claim 11 wherein the sample is RNA. 

14. The method of claim 11 wherein the sample is protein. 

15. The method of claim 11 wherein the sample is isolated from 
prenatal or embryonic cells. 

16. A method of screening to identify therapeutic agents which can 
prevent or ameliorate tumors, comprising: 

contacting a test compound with a cell; 

determining the ability of the cell to perform DNA mismatch 
repair, a test compound which increases the ability of said cell to perform DNA 
mismatch repair being a potential therapeutic agent. 

17. The method of claim 16 wherein the cell contains an hMSH2 
mutation found in a tumor. 

18. A method of screening to identify therapeutic agents which can 
prevent or ameliorate tumors, comprising: 

contacting a test compound with an isolated and purified hMSH2 
protein; 

determining the ability of the hMSH2 protein to perform DNA 
mismatch repair, a test compound which increases the ability of said protein to 
perform DNA mismatch repair being a potential therapeutic agent. 

19. The method of claim 18 wherein the hMSH2 protein contains a 
mutation found in a tumor. 
20. An isolated and purified protein which has the sequence shown in 
SEQ ID NO:2. 

21. A transgenic animal wherein the transgene is an hMSH2 allele. 

22. The transgenic animal of claim 21 wherein the allele is a mutant 
allele of hMSH2 found in a patient having hereditary nonpolyposis colorectal 
cancer or an RER+ tumor. 

23. A non-human animal which contains no wild-type MSH2 allele. 

24. The transgenic animal of claim 21 wherein said animal contains 
no wild-type MSH2 allele. 

25. A method of treating a person having an RER+ tumor to prevent 
accumulation of somatic mutations leading to resistance to an anti-cancer 
therapeutic agent, comprising: 

administering to a person having a tumor with an RER+ 
phenotype: (a) a DNA molecule which has a sequence of at least about 20 
nucleotides of hMSH2, as shown in SEQ ID NO:1, wherein said DNA molecule 
is sufficient to remedy a mutation in an hMSH2 allele of the person; and (b) an 
anti-neoplastic therapeutic agent. 

26. The method of claim 25 wherein the anti-neoplastic therapeutic 
agent is selected from the group consisting of: a hormone, radiation, a cytotoxic 
drug, a cytotoxin, and an antibody. 